Large-scale data from social media have significant potential to describe complex phenomena 
in the real world and to anticipate collective behaviors such as information spreading and social trends. 

One specific case of study is represented by the collective attention to the action of political parties. 

Not surprisingly, researchers and stakeholders tried to correlate parties’ presence on social media 
with their performances in elections. Despite the many efforts, results are still inconclusive since 
this kind of data is often very noisy and significant signals could be covered by (largely unknown) 
statistical fluctuations. In this paper we consider the number of tweets (tweet volume) of a party 
as a proxy of collective attention to the party, identify the dynamics of the volume, and show that 
this quantity carries some information on the election outcome. We find that the distribution of the 
tweet volume for each party follows a log-normal distribution with a positive autocorrelation of 
the volume over short terms, which indicates that the volume fluctuates strongly, as expected for a 
log-normal distribution, yet retains a short-term tendency. Furthermore, by measuring the ratio of two consecutive 
daily tweet volumes, we find that the evolution of the daily volume of a party can be described by 
means of a geometric Brownian motion (i.e., the logarithm of the volume moves randomly with a 
trend). Finally, we determine the optimal period of averaging tweet volume for reducing fluctuations 
and extracting short-term tendencies. We conclude that the tweet volume is a good indicator of 
parties’ success in the elections when considered over an optimal time window. Our study identifies 
the statistical nature of collective attention to political issues and sheds light on how to model the 
dynamics of collective attention in social media. 


Introduction 


As social animals, humans have long communicated, exchanged opinions, and tried to reconcile
their conflicts by means of social instruments. Despite their recent introduction, social media
and web-based services such as Google, Twitter, Facebook, and Wikipedia have already dramatically
changed the way in which people form relationships, interact with others, and acquire information.
Unlike in the past, such activities help people to overcome the physical and geographical
limitations of human interactions.

When people use social media and web services, a huge amount of digital “footprints” (i.e., data)
is created and simultaneously recorded. These “footprints” offer novel opportunities to observe
collective behaviors at unprecedented scales. For this reason, the data are generally regarded as
crucial instruments for understanding the complex and collective behaviors in our social and
technological systems [?]. Despite the recent appearance of these computer-based social media,
there is already a large number of studies describing and forecasting collective behaviors
emerging from them. For example, large-scale network analyses based on Twitter and Facebook data
have revealed the structure of social networks of tens of millions of people [?]. Twitter data
have been used to identify spreading patterns of popular information [?], classes of dynamical
collective attention [?], linguistic usage patterns on a worldwide scale [?], and political
activity [12, 13]. From Facebook data it has been possible to distinguish differences in
consumption patterns between science and conspiracy information [?]. Further, cross-cultural
differences in the evaluation of historical figures were identified based on multilingual
Wikipedia data [?], and social media usage patterns have been used to detect unemployment in
local regions [?]. Finally, users’ query logs on search engines help to anticipate the spreading
of flu [?] or the dynamics of stock markets [?], and Wikipedia activity data were used to predict
movies’ box office results [22].

Predictions of elections based on social media data have various advantages with respect to other
methods (such as traditional opinion polls). Firstly, we deal with large-scale samples; secondly,
the flow of data is such that we can get real-time responses; and finally, data collection costs
are low. For these reasons, social media data have received (and probably will receive even more
in the future) great attention from practitioners and scientists. The key question is whether
relevant information on elections can be extracted from social media data or not. It is now known
that in certain cases we can obtain indications of election results, but the reliability of
this method has to be improved [?]. For example, both positive [23-27] and null relations [28]
between social media activity and election outcomes have been observed so far. In order to
improve this method of forecasting, some scientists suggest complementing tweet volume analysis
with sentiment analysis of tweets, i.e., identification of positive or negative sentiment
[13, ?]. Nevertheless, reliable methods of sentiment analysis for political tweets are still
lacking [?]. Intuitively, mentions of political parties or politicians in social media can be
considered as expressions of people’s attention to them. However, there is no guarantee that all
of the mentions in social media correspond to support for the parties in elections. People post
tweets on political parties and politicians for various reasons, such as expressions of support,
disappointment, or sarcasm. In other words, the dynamics of tweet activity can be driven not only
by the popularity of parties or politicians but also by other factors. Therefore it is necessary
to understand the dynamics of collective attention to political parties or politicians in social
media, since such understanding will be a cornerstone for separating the “signal” from the
“noise” in the dynamics of collective attention in social media.

In this paper we consider tweet volumes about political parties as proxies of collective
attention to the parties, and by investigating the dynamics of tweet volumes we try to assess
their relation to (and forecasting power for) the final results of elections. For this purpose,
we identify dynamical and statistical characteristics of the daily tweet volumes of political
parties during election periods. We find that the distribution of the daily tweet volume of each
political party is in good agreement with a log-normal distribution [32]. This observation
indicates that the average behavior of the daily tweet volume may carry some information, yet
large fluctuations can hide the average. Thus a prediction based on Twitter data from too short
a period may not be consistent. On the other hand, we observed positive autocorrelation of the
daily tweet volume of each party over short terms. This means that the time series of daily tweet
volume largely depends on the previous activity (i.e., a short-term tendency exists). Thus,
averaging over too long a period can destroy the signal. We also find that the logarithmic ratio
of two consecutive daily tweet volumes for each party follows a normal distribution and that the
ratio is independent of time. These two observations allow us to describe the dynamics of daily
tweet volume as a geometric Brownian motion [?]. In the end, we checked whether there is an
optimal period of averaging tweet volumes which not only reduces the fluctuations but also keeps
the short-term tendency of tweet volumes. Our analysis suggests what the tweet volume of each
political party really means in a quantitative way and sheds light on how we can separate the
noise and the signal for better prediction using social media data.

Materials and Methods 
Data description 

In this paper, we consider data collected on Twitter (twitter.com), a microblogging platform used
by millions of bloggers. On Twitter, each user can freely post short messages (up to 140
characters) called “tweets” to their followers. Twitter provides application programming
interfaces (APIs) to access tweets and information about tweets and users. The potential bias of
the Twitter APIs was discussed in recent research [?]. We mainly consider the daily tweet volume
$V_p(t)$ of a given political party $p$ on day $t$. To identify the dynamics of daily tweet
volume of political parties on Twitter, we consider three elections in two European countries:
the European Parliament election of 2014 in Italy (Euro14), the Italian general election of 2013
(Italy13), and the Bulgarian general election of 2013 (Bulgaria13). Using the Twitter API, we
collected general tweets around the election days and then considered only tweets posted in the
local languages (i.e., Italian or Bulgarian) from the starting day of data collection to the day
before the election day. We used Twitter’s built-in automatic language detection to identify the
language of tweets. For the Bulgarian case, the Twitter language detection mechanism often did
not distinguish between Bulgarian and Macedonian, which are very similar. We therefore
implemented our own language detection, based on a Bayesian classifier trained on a large corpus
of over five million words for each language. Here one day is defined as the time window from
00:00:00 to 23:59:59 in local time for the Italian cases and in Greenwich Mean Time for the
Bulgarian case. For the elections in Italy (i.e., Euro14 and Italy13), we define the number of
tweets $V_p(t)$ for a given political party $p$ as the number of tweets on day $t$ mentioning the
family names of the party’s leaders or the leaders’ Twitter accounts. This is because, in the
Italian cases, the names of leaders are widely used to represent the political parties. An
overview of the three data sets is given in Table I.

• European Parliament election of 2014 in Italy (Euro14): We collected 12,535,469 tweets in
total, posted between 21 April 2014 and 12 June 2014. Of this sample, we extracted 3,413,214
Italian tweets posted between 22 April 2014 and 23 May 2014. The election day was 24 May 2014 [?].

• Italian general election of 2013 (Italy13): We collected 7,755,063 tweets in total, posted
between 11 November 2012 and 3 March 2013. Of this sample, we extracted 3,796,754 Italian tweets
posted from 1 January 2013 to 22 February 2013. The election days were 23 and 24 February
2013 [?].

• Bulgarian general election of 2013 (Bulgaria13): The raw tweet data consist of 16,077 tweets in
total, posted between 29 April 2013 and 27 May 2013 [?]. Out of this sample, we extracted 5,817
tweets posted from 29 April to 11 May 2013. The election day was 12 May 2013 [?]. In this case we
consider both the names of political parties and the names of their leaders. The retrieval of the
Bulgarian tweets was performed by the Gama System company (http://www.gama-system.si/en/) with
their Gama System® PerceptionAnalytics platform (http://demo.perceptionanalytics.net).

Detailed information on each party in each election is given in Table II.


Geometric Brownian motion 

Defining a geometric Brownian motion for the daily tweet volume $V_p(t)$ (for a party $p$) means
that $V_p(t)$ satisfies the following stochastic differential equation:

$$dV_p(t) = \mu V_p(t)\,dt + \sigma V_p(t)\,dW_t \qquad (1)$$

where $W_t$ is a Wiener process (Brownian motion), and $\mu$ and $\sigma$ are constants. In
particular, $\mu$ represents the “drift” (i.e., trend) and $\sigma$ represents the “volatility”
(i.e., random noise) of $V_p(t)$. Eq. 1 has an analytic solution under Itô’s interpretation [39]:

$$V_p(t) = V_p(0)\exp\left(\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t\right) \qquad (2)$$

where $V_p(0)$ is the initial value.

Taking the logarithm of both sides of Eq. 2, we get:

$$\log V_p(t) = \log V_p(0) + \left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t \qquad (3)$$

Since $\langle W_t \rangle = 0$, the expectation value of $\log V_p(t)$ is given by the
following equation:

$$\langle \log V_p(t) \rangle = \log V_p(0) + \left(\mu - \frac{\sigma^2}{2}\right)t \qquad (4)$$
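
As an illustration of Eqs. 2 and 4, the following minimal Python sketch simulates a daily GBM path from the exact solution and evaluates the expected log-volume. The function names, the one-day time step, and the parameter values are assumptions made for this example only; they are not taken from the data.

```python
import math
import random


def simulate_gbm(v0, mu, sigma, n_days, rng):
    """Simulate daily values of a GBM via the exact solution (Eq. 2), with dt = 1 day."""
    values = [v0]
    for _ in range(n_days):
        # Over one day the Wiener process increases by a standard normal draw.
        step = (mu - 0.5 * sigma ** 2) + sigma * rng.gauss(0.0, 1.0)
        values.append(values[-1] * math.exp(step))
    return values


def expected_log_volume(v0, mu, sigma, t):
    """Eq. 4: <log V_p(t)> = log V_p(0) + (mu - sigma^2 / 2) * t."""
    return math.log(v0) + (mu - 0.5 * sigma ** 2) * t


# Hypothetical parameters, chosen only for illustration.
rng = random.Random(0)
path = simulate_gbm(v0=1000.0, mu=0.05, sigma=0.3, n_days=30, rng=rng)
```

Because each daily step multiplies the previous value by a positive factor, the simulated volume stays positive, and its logarithm performs a random walk with drift $\mu - \sigma^2/2$.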

Results 

The main results of this paper are summarized as follows. (i) We find that the daily tweet
volumes of political parties before elections follow log-normal distributions and have positive
autocorrelations over short terms. (ii) The evolution of the daily volume can be described by
means of a geometric Brownian motion. (iii) If we want to consider the average behavior of the
daily tweet volume, it is necessary to consider a period long enough to reduce statistical
fluctuations, but not so long that it destroys short-term memory carrying relevant information.


Indication from tweet volumes 

We consider the dynamics of daily tweet volumes of political parties in three elections (Euro14,
Italy13, and Bulgaria13) based on the Twitter data collected as described in the Materials and
Methods section. The time series of the daily tweet volume $V_p(t)$ of a political party $p$,
before and after each election day, are shown in Fig. 1. Sharp peaks of the daily tweet volumes
of parties on the election days and on the days after suggest that the daily tweet volumes
reflect the attention of the public to the elections. On the other hand, other notable peaks are
also observed much earlier than the election days, which indicates that the daily tweet volumes
may be driven by factors other than election issues, such as scandals involving politicians,
their appearances in the press or mass media, or other political activities [40].

For these three elections, we want to check whether we can get an indication of the election
outcomes simply by considering the daily tweet volume of parties, or simple functions of it, as
reported in some studies [23-?]. As shown in Fig. 1, the predictive power of the daily tweet
volume differs from election to election. The ordering of parties in Fig. 1 is determined by the
actual rankings based on the number of votes in the elections (see Table II). In the case of
Bulgaria13 (Fig. 1(C)), during the whole observation period, the rankings by daily tweet volume
are the same as the actual election outcome. In the case of Euro14 (Fig. 1(A)), for most
observation days, the daily tweet volume predicted the election outcome well. In the Italy13
case (Fig. 1(B)), the prediction is less effective than in the other two cases, especially in
the early days: the predicted rankings change frequently from day to day, which makes the
forecast not very reliable. However, we cannot conclude that this is a failure of the method,
since it could actually reflect the real dynamics of voters’ opinions. Indeed, according to the
opinion polls in Italy [?], M5S had low support from the public in the early period of the
campaign. It is also notable that the Italy13 case is a typical ‘too close to call’ case (see
Table II for the actual number of votes), which makes the prediction power hard to evaluate.


Description of fluctuations in tweet volumes 

The observed fluctuations in daily tweet volumes can distort not only the prediction of parties’
rankings in elections but also the prediction of parties’ actual votes. While it seems possible
to forecast rankings in some elections, there is still work to be done to anticipate the actual
number of votes. Indeed, depending on the observation period, the prediction of the number of
votes varies because strong fluctuations exist in the daily tweet volumes of each party. Similar
behaviors were also observed previously [23, ?].

If the daily tweet volumes of parties show strong fluctuations, it is necessary at least to
describe the statistical patterns of the evolution of this quantity. To this aim, we consider the
distributions of the daily tweet volume $V_p$ over the time interval from the initial day of data
collection to the day before the elections. From visual inspection, this quantity seems to
follow a fat-tailed, log-normal-like distribution (Fig. 2(A), (C), and (E)). Due to the small
number of data samples, we show the cumulative distribution functions. To determine whether or
not the daily tweet volumes follow a log-normal distribution, we consider the Q-Q plot
(quantile-quantile plot) [?] of the logarithm of $V_p$, as shown in Fig. 2(B), (D), and (F).

Note that if the points in the Q-Q plot are close to the $y = x$ line, the data are more likely
to follow the theoretical distribution (i.e., the normal distribution in this case). As shown in
Fig. 2(B), (D), and (F), in most of the cases we can conclude that the daily tweet volumes follow
log-normal distributions, since the logarithms of the volumes follow normal distributions in the
Q-Q plots. Such a fat-tailed shape means that even if the daily tweet volume may provide relevant
information on the dynamics of collective attention to political issues, this information can be
largely hidden by statistical fluctuations. Thus, in spite of some prediction power, it is not
easy to predict the election outcome accurately beyond the rankings, due to the fluctuations.
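
The Q-Q check described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used to produce the figures: the function name `qq_points` and the plotting position $(i + 0.5)/n$ are assumptions, and the log-volumes are standardized so that they can be compared with a standard normal distribution.

```python
import math
from statistics import NormalDist, mean, stdev


def qq_points(volumes):
    """Return (theoretical, sample) quantile pairs for the standardized log-volumes."""
    logs = sorted(math.log(v) for v in volumes)
    m, s = mean(logs), stdev(logs)
    n = len(logs)
    std_normal = NormalDist()
    points = []
    for i, x in enumerate(logs):
        # Plotting position (i + 0.5) / n is one common convention (an assumption here).
        theoretical = std_normal.inv_cdf((i + 0.5) / n)
        points.append((theoretical, (x - m) / s))
    return points
```

If the volumes are log-normally distributed, the returned pairs should lie close to the $y = x$ line when plotted.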

We then checked whether the dynamics of the daily tweet volume $V_p$ can be described by a
constant volume with fluctuations, or whether there exist higher-order effects in the dynamics.
First, in order to check whether the daily tweet volume can be described as a constant term plus
a noise term, we consider the autocorrelation $R_p$ of the daily tweet volume $V_p(t)$ for each
party $p$. If we can write $V_p(t) = V_0 + \epsilon_t$, where $V_0$ is a constant and
$\epsilon_t$ is an error term, $V_p(t)$ will move around $V_0$ as a random signal without any
short-term or long-term tendency. In this case, the autocorrelation of $V_p$ will be zero. The
autocorrelation measures how similar the original time series of a variable is to the lagged time
series of the same variable. We measure the autocorrelation $R_p(\tau)$ of the daily tweet volume
of a party $p$ with a time lag $\tau$ by Pearson’s coefficient between the original tweet volume
from day $t = 0$ to $t = t_e - 1 - \tau$ and the same tweet volume from day $t = \tau$ to
$t = t_e - 1$, for a given party $p$ and lag $\tau$:


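$$R_p(\tau) = \frac{1}{t_e - \tau} \sum_{t=0}^{t_e - 1 - \tau} \frac{\left(V_p(t) - \langle V \rangle\right)\left(V_p(t+\tau) - \langle V' \rangle\right)}{\sigma_p \sigma'_p} \qquad (5)$$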

Here, $\langle V \rangle$ ($\langle V' \rangle$) is the average daily tweet volume of party $p$
from day $t = 0$ ($t = \tau$) to day $t = t_e - 1 - \tau$ ($t = t_e - 1$), $\sigma_p$
($\sigma'_p$) is the corresponding standard deviation, and $t_e$ is the election day. Thus
$R_p(\tau)$ quantifies the correlation between the original time series of the daily tweet volume
$V_p(t)$ and the $\tau$-day-lagged time series $V_p(t + \tau)$. If $R_p(\tau) = 1$, the time
series has a strongly increasing or decreasing tendency with period $\tau$. If
$R_p(\tau) = -1$, the time series shows an ‘up and down’ or zigzag pattern with period $\tau$.
If $R_p(1) \approx 0$, then we can write $V_p(t) = V_0 + \epsilon_t$, where $V_0$ is a constant
and $\epsilon_t$ is an error (or noise) term, as described above. As shown in Fig. 3, we observed
positive autocorrelations $R_p(1) > 0.2$ in all cases. This means that the daily tweet volumes of
parties have ‘increasing’ or ‘decreasing’ patterns over some time intervals and cannot be
described by a simple constant-plus-error model. However, $R_p(\tau > 2) \approx 0$ in some
cases; there the tendency does not last long. In contrast, $R_p(\tau > 2) > 0.4$ for M5S and AET
in Euro14 (Fig. 3(A)), for M5S in Italy13 (Fig. 3(B)), and for DPS and ATAKA in Bulgaria13
(Fig. 3(C)). These cases show a more persistent tendency.
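
The lagged Pearson coefficient described above can be sketched as follows. This is a minimal illustration rather than the code used for the analysis; the function name `autocorrelation` is an assumption, and the list `volumes` is assumed to hold the daily volumes of one party from day $0$ to day $t_e - 1$.

```python
import math


def autocorrelation(volumes, lag):
    """Pearson correlation between V(t) and V(t + lag) for one party."""
    x = volumes[:len(volumes) - lag]  # days 0 .. t_e - 1 - lag
    y = volumes[lag:]                 # days lag .. t_e - 1
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)
```

For a steadily increasing series such as `[1, 2, 3, 4, 5, 6]` the lag-1 value is 1, while a zigzag series such as `[1, 2, 1, 2, 1, 2]` gives -1 at lag 1 and 1 at lag 2, matching the interpretation given in the text.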


A model of fluctuations in tweet volume 

The observed log-normal distributions of the daily tweet volumes of parties suggest that the
underlying dynamics can be described by a geometric Brownian motion (GBM) [?]. This means that
the logarithm of the variable follows a Brownian motion with a drift, a situation that often
describes the dynamics of company prices in stock markets [?].

To verify this assumption we need to check whether the logarithmic ratio
$r_p(t) = \log(V_p(t+1)/V_p(t))$ follows a normal distribution and whether the same ratio is
independent of time [?].

Regarding the first point, we show in Fig. 4(A), (C), and (E) the cumulative distribution
functions of $r_p$ for every party. To confirm that they are indeed normally distributed, we
consider the Q-Q plots for each party, as shown in Fig. 4(B), (D), and (F) (constructed as
described for Fig. 2). The Q-Q plots strongly support the normality of the logarithmic ratio
$r_p(t)$ (the points lie approximately on the $y = x$ line). As for the second point, we consider
the scatter plots of the logarithmic ratio $r_p(t) = \log(V_p(t+1)/V_p(t))$ shown in Fig. 5.
From Fig. 5 we can see that the ratio $r_p(t)$ for every party is independent of time $t$.
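
To make the two checks concrete, the sketch below computes the logarithmic ratios $r_p(t)$ and a simple numerical proxy for time independence, namely the Pearson correlation between $r_p(t)$ and $t$. The text relies on visual inspection of scatter plots; the correlation used here is an assumption introduced for illustration, as are the function names.

```python
import math


def log_ratios(volumes):
    """r(t) = log(V(t + 1) / V(t)) for consecutive days."""
    return [math.log(b / a) for a, b in zip(volumes, volumes[1:])]


def trend_correlation(ratios):
    """Pearson correlation between r(t) and t; values near zero suggest no linear time trend."""
    n = len(ratios)
    times = list(range(n))
    mean_t, mean_r = sum(times) / n, sum(ratios) / n
    cov = sum((t - mean_t) * (r - mean_r) for t, r in zip(times, ratios))
    var_t = sum((t - mean_t) ** 2 for t in times)
    var_r = sum((r - mean_r) ** 2 for r in ratios)
    return cov / math.sqrt(var_t * var_r)
```

A correlation close to zero is consistent with time independence but does not prove it, since it only detects a linear trend.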

Since the above hypotheses are fulfilled, we can consider Eq. 3 as a GBM model for the dynamics
of $V_p(t)$. By linear fitting of the data with Eq. 3 we can determine the values of
$\mu - \sigma^2/2$ and $\log V_p(0)$. We then obtain the value of $\sigma$ from the fluctuations
between the data and the GBM model. The obtained values of $\mu$, $\sigma$, and $V_p(0)$ are
given in Table III.
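
A fit of this kind can be sketched as an ordinary least-squares regression of $\log V_p(t)$ on $t$. The text does not specify exactly how $\sigma$ is extracted from the fluctuations, so the sketch below makes an explicit assumption: it estimates $\sigma$ as the sample standard deviation of the daily log-ratios. The function name `fit_gbm` and the returned keys are likewise hypothetical.

```python
import math
from statistics import stdev


def fit_gbm(volumes):
    """Least-squares fit of log V(t) = log V(0) + (mu - sigma^2 / 2) * t."""
    n = len(volumes)
    times = list(range(n))
    logs = [math.log(v) for v in volumes]
    mean_t, mean_y = sum(times) / n, sum(logs) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
             / sum((t - mean_t) ** 2 for t in times))
    intercept = mean_y - slope * mean_t
    # Assumption: sigma is taken as the standard deviation of daily log-ratios.
    sigma = stdev(b - a for a, b in zip(logs, logs[1:]))
    mu = slope + 0.5 * sigma ** 2
    return {"drift": slope, "mu": mu, "sigma": sigma, "v0": math.exp(intercept)}
```

Here `drift` corresponds to $\mu - \sigma^2/2$ and `v0` to $V_p(0)$, so $\mu$ follows by adding $\sigma^2/2$ to the fitted slope.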

Fig. 6 shows the dynamics of $V_p(t)$ for each party $p$ (red lines) and the corresponding GBM
model $V_p(0)\exp((\mu - \sigma^2/2)t)$ (blue dashed lines). As guidelines, the GBM+$\sigma$
model $V_p(0)\exp((\mu - \sigma^2/2)t + \sigma)$ (green dashed lines) and the GBM-$\sigma$ model
$V_p(0)\exp((\mu - \sigma^2/2)t - \sigma)$ (cyan dashed lines) are also shown in Fig. 6. Indeed,
the GBM model describes the dynamics of the daily tweet volume in the data well, as shown in
Fig. 6, although there are some large spikes in the dynamics that go beyond the GBM+$\sigma$
model. The obtained values of $\mu$ and $\sigma$ also explain the observed strong
autocorrelations of daily tweet volumes. For example, M5S in Euro14 and Italy13 has a relatively
high $\mu$ but low $\sigma$, so the dynamics of the daily tweet volume of M5S in Euro14 and
Italy13 has a relatively strong drift with weak fluctuations. This leads to high
autocorrelations over longer terms (i.e., a strong tendency with low volatility).

Tweet volumes and election outcomes 

Until now we mainly focused on the dynamical properties and the modelling of the daily tweet
volumes of political parties in order to describe the properties of the data fluctuations. The
simplest way of reducing fluctuations is to average (or accumulate) the daily tweet volumes.
However, the positive autocorrelation and short-term memory of the volumes imply that if we
average over too long a time interval, we might lose the short-term increasing or decreasing
tendency in the dynamics. In other words, if we consider too long a period, the recent relevant
signals from tweet volumes can be hidden by old tweet volumes. In addition, if we consider tweet
volumes from days much earlier than the election day, other types of ‘noise’ compromise the
‘signal’. Twitter users typically do not pay much attention to elections before the campaign
actually starts, even though they may mention “politics” in their tweets. Thus it is necessary
to find out how long a time interval has to be considered to get optimal results in a practical
sense.

To identify the optimal time interval for averaging the daily tweet volume of a given political
party, we consider the tweet volume $\bar{V}_p(\Delta)$ of a party $p$ averaged from the day
before the election back to $|\Delta|$ days before it, as follows:

$$\bar{V}_p(\Delta) = \frac{1}{|\Delta|} \sum_{t = t_e + \Delta}^{t_e - 1} V_p(t) \qquad (6)$$

Here $t_e$ is the election day, $\Delta$ is a negative integer, and $|\Delta|$ is the absolute
value of $\Delta$, which represents the number of days remaining until the election day (i.e.,
$\Delta = -2$ means two days before the election day).
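
The windowed average of Eq. 6 and the resulting party ranking can be sketched as below. The function names and the toy data are hypothetical; each list is assumed to end on the day before the election, so that its last $|\Delta|$ entries form the averaging window.

```python
def averaged_volume(volumes, delta):
    """Eq. 6: mean of the last |delta| daily volumes (volumes[-1] is the day before the election)."""
    window = abs(delta)
    return sum(volumes[-window:]) / window


def rank_parties(daily_volumes, delta):
    """Order parties by their averaged volume, largest first."""
    averages = {party: averaged_volume(v, delta) for party, v in daily_volumes.items()}
    return sorted(averages, key=averages.get, reverse=True)


# Toy data (not from the paper): party A surges only on the last day.
toy = {"A": [10, 10, 100], "B": [50, 50, 50]}
```

With this toy data, `rank_parties(toy, -1)` puts A first, whereas `rank_parties(toy, -3)` puts B first, which illustrates how a long window can hide a recent increase.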

Fig. 7 shows the rankings of parties ordered by $\bar{V}_p(\Delta)$ for each time interval from
the day before the election day back to $|\Delta|$ days before the election. For Euro14
(Fig. 7(A)), the prediction is accurate for windows up to $\Delta = -14$. For Italy13, the
optimal length of the time interval for accurate prediction is from $\Delta = -2$ to
$\Delta = -11$. Indeed, M5S performed much better than expected before the election, and the
support for M5S was growing rapidly during the campaign. This pattern is vividly reflected in
Fig. 7(B). If we consider $\Delta = -14$, the prediction based on tweet volume places M5S third,
owing to the low support for M5S in the earlier period of the campaign. On the other hand, all
considered values of $\Delta$ give accurate and consistent predictions in the case of Bulgaria13
(Fig. 7(C)), as expected from Fig. 1.


Discussion 

Social media permeate all levels of society rapidly and widely. A huge amount of data on
collective behaviors is being generated by these social media. This phenomenon promotes
quantitative analysis of these data, with the goal of understanding collective behaviors and
predicting them in effective and efficient ways. In this paper, we analyzed the dynamics of the
daily tweet volumes of political parties on Twitter when approaching elections, identified
statistical patterns of the daily tweet volumes of parties, and described the dynamics of the
volume with a geometric Brownian motion (GBM). We found that the daily tweet volume of a given
political party follows a broad, log-normal-like distribution and has positive autocorrelation
over a short time period. Finally, we identified an optimal period of averaging tweet volumes
which not only reduces the fluctuations but also keeps the short-term tendency of tweet volumes.
Our analysis shows that daily tweet volumes have a limited ability to predict election outcomes
and that this limitation is caused by their strong fluctuations.

In order to overcome the limited prediction power of the daily tweet volume, one needs to
understand what causes the statistical fluctuations of Twitter activity and to separate the
signal from the noise in tweet volumes. The universal form of the fluctuations, with log-normal
distributions, implies that there might be a single underlying mechanism for the fluctuations,
such as a multiplicative process [?]. In particular, the driving mechanisms of peaked activities,
which cause large fluctuations, should be understood. For instance, Silvio Berlusconi is a
popular figure in Italian politics and society. He therefore receives a large number of Twitter
mentions not only from his supporters but also from his opponents; often these mentions are not
just about politics but also about his private life. For example, on 9 Jan. 2013, a sharp peak
for FI (i.e., tweets mentioning Berlusconi) was observed in Fig. 1. From the news on this day we
learned that an Italian court had fixed the financial consequences of his divorce and that he
had been charged with prostitution with a minor (by the time of publication of this article the
trial had ended and he had been found not guilty). This example clearly illustrates that the
peaks can stem not only from election issues but also from private issues of the politicians.
This also means that one needs to consider the role of mass media in the daily tweet volumes of
political parties. All these factors can have a significant influence on the tweet volumes of
political parties or politicians. Systematic consideration of these factors can give us some
hints about how much of the fluctuations originates from endogenous or exogenous mechanisms.

Expanding the point of view, it would be interesting to identify whether the dynamics after the
election can also be described as a GBM or not. If so, the GBM model for the dynamics after the
election might have different drift ($\mu$) and volatility ($\sigma$) terms in Eq. 1 from the
ones in the current GBM model for the dynamics before the election. This is because, as shown in
Fig. 1, the dynamics of tweet volume typically shows a peak on the election day or the day after
the election and a decreasing pattern thereafter. This implies that the drift (i.e., tendency)
term of the GBM might change after the election, since the collective attention moves to other
issues. In order to describe the dynamics after the election as a GBM, it is necessary to test
the normality of the logarithmic ratio of consecutive tweet volumes and the time-independence of
the ratio, as done in Fig. 4 and Fig. 5. For these tests, we need to consider a data set of
tweets collected after the elections.

Not only a single social medium but also multiple social media can be considered to predict the
election outcome. For instance, Wikipedia and search engine data have been used to forecast
election outcomes [?], and sentiment analysis was suggested for reinforcing the forecasting
performance. Checking the validity of combined social media data will be one of our future
research directions.

Another interesting problem worth considering is to determine whether the patterns of daily
tweet volumes of political parties (for example, the log-normal distribution) have universal
features. If this is the case, it would be important to determine whether we observe similar
patterns for other events. Indeed, broad distributions of tweet volume for brand names [?] and of
attention to online items have already been reported. Hence, investigating the dynamics of tweet
volumes of various objects can allow us to check for universal features of the dynamics. Further
research will be necessary to settle this point.

The influence of social media on political and social issues keeps growing. Understanding the
mathematical nature of the dynamics of collective attention to elections in social media can
enhance our ability to anticipate the dynamics of collective attention to other political or
social issues.
TABLE I: Description of Twitter data set. Time stamps in Euro14 and Italy13 are in local time while time stamps in
Bulgaria13 are in Greenwich Mean Time (GMT). There is a three-hour difference between GMT and Bulgarian time. Ti
represents the initial day of considered data. Te is the election day. Tf represents the final day of considered data. One day is
defined as a time interval from 00:00:00 to 23:59:59 in the considered time. Nt represents the total number of considered tweets for
the given time interval from Ti to Te-1 posted in the local language. Np represents the number of considered political parties.

Data set    Ti            Te            Tf            Nt         Language   Np  Held in
Euro14      22 Apr. 2014  25 May 2014   12 Jun. 2014  3,413,214  Italian    7   Italy
Italy13     1 Jan. 2013   23 Feb. 2013  3 Mar. 2014   3,796,754  Italian    6   Italy
Bulgaria13  29 Apr. 2013  12 May 2013   27 May 2013   5,817      Bulgarian  4   Bulgaria


TABLE II: Description of considered political parties for each election. The official sources of election results are 
provided on \3^( EurolA). ^^{ItalylS), and {Bulgarial3) respectively. 


Eurol4: European Parliament election 2014, Italy 

Rank 

Party 

Actual votes 

Leaders 

1 

Partito Democratico (PD) 

11,203,231 

Matteo Renzi 

2 

MoVimento Cinque Stelle (MSS) 

5,807,362 

Beppe Grille 

3 

Forza Italia (FI) 

4,614,364 

Silvio Berlusconi 

4 

Lega Nord (LN) 

1,688,197 

Matteo Salvini 

5 

Nuovo Centrodestra - Unione di Centro (NCD-UdC) 

1,202,350 

Angelino Alfano, Pier Ferdinando Casini 

6 

L’Altra Europa con Tsipras (AFT) 

1,108,457 

Alexis Tsipras, Nichi Vendola, Paolo Ferrero 

7 

Fratelli d’ltalia - Alleanza Nazionale (Fdl-AN) 

1,006,513 

Giorgia Melon! 

Italy13: Italian general election 2013

| Rank | Party | Actual votes | Leaders |
|---|---|---|---|
| 1 | MoVimento Cinque Stelle (M5S) | 8,691,406 | Beppe Grillo |
| 2 | Partito Democratico (PD) | 8,646,034 | Pier Luigi Bersani, Matteo Renzi |
| 3 | Il Popolo della Libertà (PdL) | 7,332,134 | Silvio Berlusconi |
| 4 | Scelta Civica (SC) | 2,823,842 | Mario Monti |
| 5 | Lega Nord (LN) | 1,390,534 | Roberto Maroni |
| 6 | Sinistra Ecologia Libertà (SEL) | 1,089,231 | Nichi Vendola |

Bulgaria13: Bulgarian general election 2013

| Rank | Party | Actual votes | Leaders |
|---|---|---|---|
| 1 | GERB | 1,081,605 | Boyko Borisov |
| 2 | BSP | 942,541 | Sergei Stanishev |
| 3 | DPS | 400,446 | Lyutvi Mestan |
| 4 | ATAKA | 258,481 | Volen Siderov |
TABLE III: Parameters to describe the dynamics of daily tweet volume of political parties as a geometric
Brownian motion (GBM). The daily tweet volume $V_p(t)$ of party $p$ at time $t$ given by a GBM is
$V_p(t) = V_p(0)\exp\left((\mu - \sigma^2/2)t + \sigma W(t)\right)$, where $W(t)$ is a Wiener process (Brownian motion).


Euro14: European Parliament election 2014, Italy

| Rank | Party | $\mu - \sigma^2/2$ | $\mu$ | $\sigma$ | $V_p(0)$ |
|---|---|---|---|---|---|
| 1 | PD | 0.0124 | 0.0627 | 0.3171 | 18299.2 |
| 2 | M5S | 0.0469 | 0.0925 | 0.3018 | 6143.3 |
| 3 | FI | 0.0053 | 0.0955 | 0.4247 | 9714.3 |
| 4 | LN | 0.0592 | 0.2995 | 0.6932 | 686.2 |
| 5 | NCD-UdC | 0.0059 | 0.0893 | 0.4088 | 2578.5 |
| 6 | AET | 0.0581 | 0.1513 | 0.4316 | 520.0 |
| 7 | FdI-AN | 0.0404 | 0.4013 | 0.8496 | 238.5 |

Italy13: Italian general election 2013

| Rank | Party | $\mu - \sigma^2/2$ | $\mu$ | $\sigma$ | $V_p(0)$ |
|---|---|---|---|---|---|
| 1 | M5S | 0.0328 | 0.0979 | 0.3608 | 3294.9 |
| 2 | PD | 0.0181 | 0.0815 | 0.3561 | 9121.2 |
| 3 | PdL | 0.0039 | 0.1164 | 0.4744 | 16763.5 |
| 4 | SC | -0.0048 | 0.0490 | 0.3278 | 22856.7 |
| 5 | LN | 0.0104 | 0.2264 | 0.6573 | 576.0 |
| 6 | SEL | 0.0127 | 0.1406 | 0.5057 | 2458.3 |

Bulgaria13: Bulgarian general election 2013

| Rank | Party | $\mu - \sigma^2/2$ | $\mu$ | $\sigma$ | $V_p(0)$ |
|---|---|---|---|---|---|
| 1 | GERB | 0.0435 | 0.1591 | 0.4808 | 194.8 |
| 2 | BSP | 0.0892 | 0.1904 | 0.4498 | 55.8 |
| 3 | DPS | 0.2110 | 0.2496 | 0.2782 | 7.6 |
| 4 | ATAKA | 0.2248 | 0.4020 | 0.5954 | 2.4 |
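
To make the GBM formula in the Table III caption concrete, the following minimal sketch draws one sample path with a one-day time step. The function name `simulate_gbm`, the seed, and the 33-day horizon are illustrative assumptions; the parameter values are those listed for PD in Euro14. It is not the authors' fitting or simulation code.

```python
import math
import random


def simulate_gbm(v0, mu, sigma, days, seed=0):
    """Sample V(t) = V(0) * exp((mu - sigma^2/2) * t + sigma * W(t)) for t = 0..days."""
    rng = random.Random(seed)
    drift = mu - 0.5 * sigma ** 2
    w = 0.0  # Wiener process, W(0) = 0
    path = [v0]
    for t in range(1, days + 1):
        w += rng.gauss(0.0, 1.0)  # one-day increment of W is N(0, 1)
        path.append(v0 * math.exp(drift * t + sigma * w))
    return path


# PD in Euro14, values as listed in Table III.
mu, sigma, v0 = 0.0627, 0.3171, 18299.2
path = simulate_gbm(v0, mu, sigma, days=33)

# The first numeric column of Table III is the drift of the log volume.
drift = mu - 0.5 * sigma ** 2
```

The computed `drift` agrees with the tabulated value 0.0124 to rounding, which is a quick consistency check on the columns of Table III.
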
FIG. 1: Daily tweet volume for each party around elections. The ordering of parties (i.e., the numbers in parentheses)
is based on actual ranking in the election. (A) Euro14. 1st: PD. 2nd: M5S. 3rd: FI. 4th: LN. 5th: NCD-UdC. 6th: AET. 7th:
FdI-AN. (B) Italy13. 1st: M5S. 2nd: PD. 3rd: PdL. 4th: SC. 5th: LN. 6th: SEL. (C) Bulgaria13. 1st: GERB. 2nd: BSP. 3rd:
DPS. 4th: ATAKA.

FIG. 2: Cumulative distribution functions (CDF) of daily tweet volumes (A, C, E) and Q-Q plots of logarithms
of daily tweet volumes for each political party (B, D, F). Each volume in the CDF is normalized by the average $\langle V \rangle$. (A)
CDF of daily tweet volume in Euro14. (B) Q-Q plot in Euro14. (C) CDF of daily tweet volume in Italy13. (D) Q-Q plot in
Italy13. (E) CDF of daily tweet volume in Bulgaria13. (F) Q-Q plot in Bulgaria13. Note that the Q-Q plot is for the logarithm of
the daily tweet volume. The theoretical quantile in the Q-Q plot is based on the normal distribution. Thus, if the points in the Q-Q plot
lie on the line $y = x$, the daily tweet volume follows a log-normal distribution, since the logarithm of the volume follows a normal
distribution.
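
The Q-Q construction described in the Fig. 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions: the log volumes are standardized by their sample mean and standard deviation, and the plotting position $(i - 0.5)/n$ is an assumed convention, since the caption does not specify one. The function name `qq_points` and the sample volumes are made up.

```python
import math
from statistics import NormalDist, mean, stdev


def qq_points(volumes):
    """Return (theoretical, sample) quantile pairs for standardized log volumes."""
    logs = sorted(math.log(v) for v in volumes)
    m, s = mean(logs), stdev(logs)
    n = len(logs)
    std_normal = NormalDist()  # standard normal, mean 0 and std dev 1
    points = []
    for i, x in enumerate(logs, start=1):
        p = (i - 0.5) / n  # assumed plotting position
        points.append((std_normal.inv_cdf(p), (x - m) / s))
    return points


# Made-up daily volumes for one party.
volumes = [120, 340, 95, 410, 180, 260, 150, 520, 210, 300]
points = qq_points(volumes)
```

If the volumes are log-normally distributed, the pairs in `points` should scatter around the line $y = x$.
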
FIG. 3: Autocorrelation of daily tweet volume for each political party. The autocorrelation coefficient $R_p(\tau)$ is given by
$R_p(\tau) = \frac{\langle (V_p(t) - \langle V \rangle)(V_p(t+\tau) - \langle V' \rangle) \rangle}{\sigma_p \sigma'_p}$. Here $\langle V \rangle$ ($\langle V' \rangle$) is the average daily tweet volume for party $p$ from day $t = 0$
($t = \tau$) to day $t = t_e - 1 - \tau$ ($t = t_e - 1$), $\sigma_p$ ($\sigma'_p$) is the standard deviation, and $t_e$ is the election day. (A) Euro14. (B) Italy13.
(C) Bulgaria13.
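
The autocorrelation in Fig. 3 can be sketched as below. The sketch assumes the coefficient is the covariance between the volume series and its copy shifted by a lag `tau`, normalized by the standard deviations of the two overlapping windows, each with its own mean. The function name `autocorrelation` is hypothetical, and population (not sample) standard deviations are an assumption.

```python
from statistics import mean, pstdev


def autocorrelation(volumes, tau):
    """Lag-tau autocorrelation using separate means and std devs for the two windows."""
    n = len(volumes)
    if not 0 <= tau < n - 1:
        raise ValueError("tau must leave at least two overlapping days")
    head = volumes[: n - tau]  # V(t) for t = 0 .. n-1-tau
    tail = volumes[tau:]       # V(t + tau) for the same t
    m_head, m_tail = mean(head), mean(tail)
    s_head, s_tail = pstdev(head), pstdev(tail)
    cov = mean((a - m_head) * (b - m_tail) for a, b in zip(head, tail))
    return cov / (s_head * s_tail)


# Made-up series: a steady trend is perfectly correlated with its shifted copy,
# while an alternating series is perfectly anti-correlated at lag 1.
trend = [1, 2, 3, 4, 5]
alternating = [1, -1, 1, -1, 1, -1]
r_trend = autocorrelation(trend, 1)
r_alt = autocorrelation(alternating, 1)
```

A constant series has zero standard deviation, so the coefficient is undefined in that case and the function would raise a division error.
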
FIG. 4: Normality of the logarithmic ratio $r_p(t) = \log(V_p(t+1)/V_p(t))$ of two consecutive tweet volumes of party
$p$. Cumulative distribution functions of the log ratio for each party are represented in (A) Euro14, (C) Italy13, and (E) Bulgaria13.
The Q-Q plots of the log ratio $r_p(t)$ for each party are also represented in (B) Euro14, (D) Italy13, and (F) Bulgaria13. The
theoretical quantile is based on the normal distribution. In the Q-Q plot, if the points lie on $y = x$, the log ratio follows a
normal distribution.
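
As an illustration of how the log ratio connects to the GBM parameters, the sketch below computes $r_p(t)$ from a volume series and derives moment estimates. It assumes a one-day time step, so that the mean of the log ratios estimates $\mu - \sigma^2/2$ and their standard deviation estimates $\sigma$. This is a standard moment estimator and not necessarily the fitting procedure used by the authors; the function names and the sample series are hypothetical.

```python
import math
from statistics import mean, stdev


def log_ratios(volumes):
    """r(t) = log(V(t+1) / V(t)) for consecutive days."""
    return [math.log(b / a) for a, b in zip(volumes, volumes[1:])]


def estimate_gbm(volumes):
    """Moment estimates (drift, mu, sigma) for a one-day time step."""
    r = log_ratios(volumes)
    drift = mean(r)    # estimates mu - sigma^2/2
    sigma = stdev(r)   # sample standard deviation estimates sigma
    mu = drift + 0.5 * sigma ** 2
    return drift, mu, sigma


# Made-up series that doubles and halves on alternating days.
volumes = [100, 200, 100, 200, 100]
drift, mu, sigma = estimate_gbm(volumes)
```

For this toy series the log ratios alternate between $\log 2$ and $-\log 2$, so the estimated drift is zero while $\mu$ is positive, which shows why the two columns in Table III differ.
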
FIG. 5: Scatter plot of time $t$ and log ratio $r_p(t) = \log(V_p(t+1)/V_p(t))$ for each party $p$. Here $V_p(t)$ is the tweet volume
of party $p$ at time $t$.

FIG. 6: Dynamics of daily tweet volume for each party represented by data and by the GBM model. In
the GBM model, the expected volume $V_p(t)$ at time $t$ is given by $V_p(t) = V_p(0)\exp((\mu - \sigma^2/2)t)$. In the GBM+$\sigma$ model,
$V_p(t) = V_p(0)\exp((\mu - \sigma^2/2)t + \sigma)$, while $V_p(t) = V_p(0)\exp((\mu - \sigma^2/2)t - \sigma)$ in the GBM-$\sigma$ model. The values of the parameters $\mu$, $\sigma$,
and $V_p(0)$ are given in Table III.
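
The three model curves named in the Fig. 6 caption can be tabulated with a short sketch. It evaluates the central GBM curve and the two curves shifted by plus and minus $\sigma$ in the exponent. The function name `gbm_curves` and the 10-day horizon are illustrative assumptions; the parameter values are those listed for PD in Euro14 in Table III.

```python
import math


def gbm_curves(v0, mu, sigma, days):
    """Rows of (t, GBM-sigma, GBM, GBM+sigma) for t = 0..days."""
    drift = mu - 0.5 * sigma ** 2
    rows = []
    for t in range(days + 1):
        center = v0 * math.exp(drift * t)
        lower = center * math.exp(-sigma)  # exp(drift * t - sigma)
        upper = center * math.exp(sigma)   # exp(drift * t + sigma)
        rows.append((t, lower, center, upper))
    return rows


# PD in Euro14, values as listed in Table III.
rows = gbm_curves(18299.2, 0.0627, 0.3171, 10)
```

Because the shift enters the exponent, the band is a constant multiplicative factor around the central curve and appears with constant width on a logarithmic axis.
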
FIG. 7: Predicted ranking determined by tweet volume averaged from the day before the election back to
T days before the election. The averaged volume is given by the corresponding equation in the main text. The numbers in
parentheses represent the actual rankings of the parties in the election. (A) Euro14. (B) Italy13. (C) Bulgaria13.
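
The ranking procedure behind Fig. 7 can be sketched as follows. The sketch assumes the averaged volume is the arithmetic mean of the daily volumes over the last `window` days before the election, which may differ from the exact definition in the paper's equation. The function name `predicted_ranking`, the party labels, and the series are all made up.

```python
from statistics import mean


def predicted_ranking(daily_volumes, window):
    """Rank parties by mean daily volume over the last `window` days before the election."""
    if window < 1:
        raise ValueError("window must be at least one day")
    averaged = {party: mean(series[-window:]) for party, series in daily_volumes.items()}
    return sorted(averaged, key=averaged.get, reverse=True)


# Made-up series; the last entry of each list is the day before the election.
daily_volumes = {
    "A": [900, 100, 120, 110],
    "B": [200, 210, 190, 205],
    "C": [50, 40, 600, 30],
}
ranking_1 = predicted_ranking(daily_volumes, 1)
ranking_2 = predicted_ranking(daily_volumes, 2)
ranking_4 = predicted_ranking(daily_volumes, 4)
```

The three windows give three different rankings for the same data, because a single-day spike dominates short windows and is diluted in longer ones. This is the sensitivity that motivates choosing the averaging period carefully.
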











































